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(57) Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial at- 
^2 tributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one 
monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The 
decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a 
bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the connect spatial impression at the 
receiving end. 
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Spatial audio 



This invention relates to the coding of audio signals and, more particularly, the 
coding of multi-channel audio signals. 

5 Within the field of audio coding it is generally desired to encode an audio 

signal, e.g. in order to reduce tihe bit rate for communicating the signal or the storage 
requirement for storing the signal, without unduly compromising the perceptual quality of the 
audio signal. This is an important issue when audio signals are to be transmitted via 
communications chaimels of lindted capacity or when they are to be stored on a storage 

1 0 medium having a limited capacity. 

Prior solutions in audio coders that have been suggested to reduce the bitrate 
of stereo program material include: 

Intensity stereo \ Jn this algorithm, high frequencies (typically above 5 kHz) 
are r^resented by a single audio signal (i.e., mono), combined with time-varying and 

1 5 frequency-dependent scalefactors, 

'M/S stereo In this algorithm, the signal is decomposed into a sum (or mid, or 
common) and a difference (or side, or imcommon) signal. This decomposition is sometimes 
combined witii principle component analysis or time-varying scalefactors. These signals are 
then coded independently, either by a transform coder or waveform coder. The amount of 

20 information reduction achieved by this algorithm strongly depends on the spatial properties 
of the source signal. For example, if the source signal is monaural, the difference signal is 
zero and can be discarded. However, if the correlation of the left and right audio signals is 
low (which is often the case), this scheme offers only litfle advantage. 

Parametric descriptions of audio signals have gained interest during the last 

25 years, especially in the field of audio coding. It has been shown tiiat transmitting (quantized) 
parameters that describe audio signals requires only litfle transmission capacity to 
resyuthesize a perceptually equal signal at flie receiving end. However, current parametric 
audio coders focus on coding monaural signals, and stereo signals are often processed as dual 
mono. 
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European patent application EP 1 107 232 discloses a metbod of encoding a 
stereo signal having an L and an R component, where the stereo signal is represented by one 
of the stereo components and parametric information capturing phase and level dififerences 
of the audio signal. At the decoder, the other stereo component is recovered based on tiie 
encoded stereo component and the parametric information. 

It is an object of the present invention to solve the problem of providing an 
improved audio coding that yields a high perceptual quality of the recovered signal. 



The above and other problems are solved by a method of coding an audio 
signal, the method comprising: 

- generating a monaural signal comprising a combination of at least two input audio 
channels, 

- determining a set of spatial parameters indicative of spatial properties of the at least two 
input audio chaimels, the set of spatial parameters including a parameter representing a 
measure of similarity of waveforms of the at least two input audio channels, and 

- generating an encoded signal comprising the monaural signal and the set of spatial 
parameters. 

It has been realized by the inventor that by encoding a multi-channel audio 
signal as a monaural audio signal and a number of spatial attributes comprising a measure of 
similarity of the corresponding waveforms, the multi-channel signal may be recovered with a 
high perceptual quality. It is a further advantage of the invention that it provides an efficient 
encoding of a multi-chaimel signal, i.e. a signal comprising at least a first and second 
channel, e.g. a stereo signal, a quadraphonic signal, etc. 

Hence, according to an aspect of the invention, spatial attributes of multi- 
channel audio signals are parameterized. For general audio coding applications, transmitting 
these parameters combined with only one monaural audio signal strongly reduces the 
transmission capacity necessary to transmit the stereo signal compared to audio coders that 
process the channels independently, while maintaining the original spatial impression. An 
important issue is ttiat although people receive waveforms of an auditory object twice (once 
by the left ear and once by the right ear), only a single auditory object is perceived at a 
certain position and witii a certain size (or spatial diffuseness). 

Therefore, it seems xmnecessary to describe audio signals as two or more 
(independent) waveforms and it would be better to describe mvilti-channel audio as a set of 
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auditory objects, each with its own spatial properties. One difficulty that immediately arises 
is the fact that it is almost impossible to automatically separate individual auditory objects 
firom a given ensemble of auditory objects, for example a musical recording. This problem 
can be circumvented by not splitting the program material in individual auditory objects, but 
5 rather describmg tiie spatial parameters in a way that resembles the effective (peripheral) 
processing of the auditory system. When the spatial attributes comprise a measure of 
(dis)similari1y of the corresponding wavefomis, an efficient coding is achieved while 
maintaining a high level of perceptual quality. 

In particular, the parametric description of multi-channel audio presented here 

10 is related to the binaural processing model presented by Breebaart et al. This model aims at 
describing the effective signal processing of the binaural auditory system. For a description 
of the binaural processing model by Breebaart et al., see Breebaart, J,, van de Par, S. and 
Kohlraxisch, A. (2001a). Binaural processing model based on contralateral inhibition. I. 
Model setup. J Acoust Soc. Am., 110, 1074-1088; Breebaart, J., van de Par, S. and 

15 Kohhauschi A, (2001b). Binaural processmg model based on contralateral inhibition, n. 

Dependence on spectral parameters J, Acoust Soc. Am,, 110, 1089-1104; and Breebaart, J., 
van de Par, S, and Kohlrausch, A. (2001c). Binaural processing model based on contralateral 
inhibition, m. Dependence on temporal parameters.. J. Acoust Soc. Am., 110, 1 105-1 1 17. A 
short interpretation is given below which helps to understand the invention. 

20 In a preferred embodiment, the set of spatial parameters includes at least one 

localization cue. When the spatial attributes comprise one or more, preferably two, 
localization cues as well as a measure of (dis)sinadlarity of the corresponding waveforms, a 
particularly efficient coding is achieved while maintaining a particularly high level of 
perceptual quality. 

25 The term localization cue comprises any suitable parameter conveying 

information about the localization of auditory objects contributing to the audio signal, e.g. 
the orientation of and/or the distance to an auditory object 

In a preferred embodiment of the invention, the set of spatial parameters 
includes at least two localization cues comprising an interchannel level difference (ILD) and 

30 a selected one of an interchannel time difference (TTD) and an interchannel phase difference 
(IPD), It is interesting to mention that tiie interchannel level difference and the interchannel 
time difference are considered to be the most important localization cvies in the horizontal 
plane. 
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The measure of sixnilarity of the waveforms corresponding to the first and 
second audio channels may be any suitable function describing how similar or dissimilar the 
corresponding waveforms are. Hence, the measure of similarity may be an increasing 
function of similarity, e.g. a parameter determined fi»m to the interchannel cross-coirelation 
5 (function). 

According to a preferred embodiment, the measure of similarity corresponds 
to a value of a cross-correlation function at a maximum of said cross-correlation fiinction 
(also known as coherence). The maximum interchannel cross-correlation is strongly related 
to ibe perceptual spatial diffuseness (or compactness) of a soimd source, i.e. it provides 
1 0 additional information which is not accounted for by the above localization cues, thereby 
providing a set of parameters with a low degree of redundancy of the information conveyed 
by them and, thus, providing an efficient coding. 

It is noted that, alternatively, other measures of similarity may be used, e.g. a 
fiinction mcreasing with the dissimilarity of the waveforms. An example of such a function is 
15 1-c, where c is a cross-correlation that may assume values between 0 and 1 . 

Accordiag to a preferred embodiment of the invention, the step of determining 
a set of spatial parameters indicative of spatial properties comprises determining a set of 
spatial parameters as a function of time and fi«quency. 

It is an insight of the inventors that it is sufficient to describe spatial attributes 
20 of any multichannel audio signal by specifying the ILD, ITD (or IPD) and the maximum 
correlation as a function of time and fi«quency. 

In a furttier preferred embodiment of the invention, the step of determining a 
set of spatial parameters indicative of spatial properties comprises 

- dividing each of the at least two ii^ut audio channels into corresponding pluralities of 
25 fi«quency bands; 

- for each of the plurality of firequency bands detamining the set of spatial parameters 
indicative of spatial properties of tiie at least two input audio channels within the 
corresponding firequency band. 

Hence, the incoming audio signal is split into several band-limited signals, 
30 which are (preferably) spaced linearly at an ERB-rate scale. Preferably the analysis filters 
show a partial overlap in the fi»quency and/or time domaiiL The bandwidth of these signals 
depends on the crater fi»quency, following tiie ERB rate. Subsequenfiy, preferably every 
frequency band, the following properties of the mcoming signals are analyzed: 
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- The interchannel level difference, or ILD, defined by the relative levels of the band- 
limited signal stemming firom the left and right signals, 

- The interchannel time (or phase) difference QTD or IPD), defined by the interchannel 
delay (or phase shift) corresponding to the position of the peak in flie interchannel cross- 
correlation function, and 

- The (dis)similarity of the waveforms that can not be accounted for by ITDs or BLDs, 
which can be parameterized by the maximum interchannel cross-correlation (i.e., the 
value of the normalized cross-correlation function at the position of flie m a x imum peak, 
also known as coherence). 

The three parameters described above vary over time; however, since the 
binaural auditory system is very sluggish in its processing, the update rate of these properties 
is rather low (typically tens of milliseconds). 

It may be assumed here that the (slowly) time-varying properties mentioned 
above are the only spatial signal properties that the binaural auditory system has available, 
and that firom these time and firequency dependent parameters, the perceived auditory world 
is reconstmcted by higher levels of the auditory system. 

An embodiment of the current invention aims at describing a multichannel 
audio signal by: 

one monaural signal, consisting of a certain combination of the input signals, 

and 

a set of spatial parameters: two localization cues (ILD, and ITD or IPD) and a 
parameter that describes the similarity or dissimilarity of the waveforms that cannot be 
accounted for by ILDs and/or ITDs (e.g., the maximum of the cross-correlation fimction) 
preferably for every time/firequency slot. Preferably, spatial parameters are included for each 
additional auditory channel. 

An important issue of transmission of parameters is the accuracy of the 
parameter representation (i.e., the size of quantization errors), which is directiy related to the 
necessary transmission capacity. 

According to yet another preferred embodiment of the invention, the step of 
generating an encoded signal comprising the monaural signal and the set of spatial 
parameters comprises generating a set of quantized spatial parameters, each introducuig a 
corresponding quantization error relative to the corresponding determined spatial parameter, 
wherein at least one of tiie introduced quantization errors is controlled to depend on a value 
of at least one of the determined spatial parameters. 
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Hence, Hie quantization error introduced by the quantization of the parameters 
is controlled according to the sensitivity of the human auditory system to changes in these 
parameters. This sensitivity strongly depends on the values of the parameters itself. Hence, 
by controlling the quantization error to depend on the values of the parameters, and improved 
S encoding is achieved. 

It is an advantage of Hie invention that it provides a decoupling of monaural 
and binaural signal parameters in audio coders. Hence, dif&culties related to stereo audio 
coders are strongly reduced (such as the audibility of interaurally uncorrelated quantization 
noise compared to interaurally correlated quantization noise, or interaural phase 
.0 inconsistencies in parametric coders that are encoding in dual mono mode). 

It is a further advantage of the invention that a strong bitrate reduction is 
achieved in audio coders due to a low update rate and low frequency resolution required for 
the spatial parameters. The associated bitrate to code the spatial parameters is typically 10 
kbit/s or less (see the embodiment described below). 
.5 It is a further advantage of the invention that it may easily be combined with 

existing audio coders. The proposed scheme produces one mono signal that can be coded and 
decoded with any existing coding strategy. After monaural decoding, the system described 
here regen^ates a stereo multichannel signal with the ^propriate spatial attributes. 

The set of spatial parameters can be used as an enhancement layer in audio 
10 coders. For example, a mono signal is transmitted if only a low bitrate is allowed, while by 
including the spatial enhancement layer the decoder can reproduce stereo sound. 

It is noted that the invention is not limited to stereo signals but may be applied 
to any multi-channel signal comprising n channels (n>l). In particular, the invention can be 
used to generate n channels from one mono signal, if (n-1) sets of spatial parameters are 
25 transmitted. In this case, the spatial parameters describe how to form the n different audio 
channels from the single mono signal. 

The present invention can be implemented in different ways including the 
method described above and in the following, a method of decoding a coded audio signal, an 
encoder, a decoder, and further product means, each yielding one or more of the benefits and 
30 advantages described in connection with die jBrst-mentioned method, and each having one or 
more preferred embodiments corresponding to the preferred embodiments described in 
connection with the first-mentioned method and disclosed in the dependant claims. 

It is noted tiiat the features of the method described above and in the following 
may be implemented in software and carried out in a data processing system or other 
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processing means caused by the execution of computer-executable instructiojos. The 
instructions may be program code means loaded in a memory, such as a RAM, from a storage 
medimn or from another computer via a computer network. Alternatively, the described 
features may be implemented by hardwired circuitry instead of software or in combination 
5 with software. 

The invention further relates to an encoder for coding an audio signal, the 
encoder comprising: 

- means for generating a monaural signal comprising a combination of at least two input 
audio channels, 

10 - means for determining a set of spatial parameters indicative of spatial properties of titxe at 
least two input audio channels, the set of spatial parameters including a parameter 
representing a measure of similarity of waveforms of the at least two input audio 
channels, and 

- means for generating an encoded signal comprising the monaural signal and die set of 
L 5 spatial parameters. 

It is noted that the above means for generating a monaural signal, the means 
for determining a set of spatial parameters as well as means for generating an encoded signal 
naay be implemented by any suitable circuit or device, e.g. as general- or special-purpose 
programmable microprocessors. Digital Signal Processors (DSP), Application Specific 
>0 Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate 
Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof. 

The invention further relates to an apparatus for supplying an audio signal, the 
apparatus comprising: 

- an input for receiving an audio signal, 

Z5 - an encoder as described above and in the following for encoding the audio signal to 
obtain an encoded audio signal, and 

- an output for supplying the encoded audio signal. 

The apparatus may be any electronic equipment or part of such equipment, 
such as stationary or portable computers, stationary or portable radio communication 
JO equipment or other handheld or portable devices, such as media players, recording devices, 
etc. The term portable radio communication equipment includes all equipment such as 
mobile telephones, pagers, communicators, i.e. electronic organizei^, smart phones, personal 
digital assistants (PDAs), handheld computers, or the like. 
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The input may comprise any suitable circuitry or device for receiving a multi- 
channel audio signal in analogue or digital form, e.g. via a wired connection, such as a line 
jack, via a wireless connection, e.g. a radio signal, or in any other suitable way. 

Similarly, the output may comprise any suitable circuitry or device for 
5 supplying the encoded signal. Examples of such outputs include a netv^ork interface for 
providing the signal to a computer network, such as a LAN, an Internet, or the like, 
conmaunications circuitry for coramunicating the signal via a communications chaimel, e.g. a 
wireless communications chaimel, etc. In other embodiments, the output may comprise a 
device for storing a signal on a storage medium. 
LO The invention further relates to an encoded audio signal , the signal 

comprising: 

- a monaural signal comprising a combination of at least two audio channels, and 

- a set of spatial parameters indicative of spatial properties of the at least two input audio 
channels, the set of spatial parameters including a parameter representing a measure of 

15 similarity of waveforms of the at least two input audio channels. 

The invention further relates to a storage medium having stored thereon such 
an encoded signal. Here, the term storage medium comprises but is not limited to a magnetic 
tape, an optical disc, a digital video disk (DVD), a compact disc (CD or CD-RON^, a mini- 
disc, a hard disk, a floppy disk, a foro-electric memory, an electrically erasable 

20 programmable read only memory (EEPROM), a flash memory, an EPROM, a read only 
memory (ROM), a static random access memory (SRAM), a dynamic random access 
memory (DRAM), a synchronous dynamic random access memory (SDRAM), a 
ferromagnetic memory, optical storage, charge coupled devices, smart cards, a PCMCIA 
card, etc. 

25 The invention further relates to a method of decoding an encoded audio signal, 

the method comprising: 

- obtaining a monaural signal from the encoded audio signal, the monaural signal 
comprising a combination of at least two audio channels, 

- obtaining a set of spatial parameters from the encoded audio signal, the set of spatial 

30 parameters including a parameter representing a measure of similarity of waveforms of 
the at least two audio chaimels, and 

- generating a multi-channel output signal from the monaural signal and the spatial 
parameters. 
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The invention further relates to a decoder for decoding an encoded audio 
signal, the decoder comprising 

- means for obtaining a monaural signal fiom the encoded audio signal, the monaural 
signal comprising a combination of at least two audio channels, 

- means for obtaining a set of spatial parameters from the encoded audio signal, the set of 
spatial parameters including a parameter representing a measxire of similarity of 
waveforms of the at least two audio channels, and 

- means for generating a midti-channel output signal from the monaural signal and the 
spatial parameters. 

It is noted that the above means may be implemented by any suitable circuit or 
device, e.g. as general- or special-purpose programmable microprocessors. Digital Signal 
Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic 
Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, 
etc, or a combination thereof. 

The invention further relates to an apparatus for supplying a decoded audio 
signal , the apparatus comprising: 

- an input for receiving an encoded audio signal, 

- a decoder as described above and in the following for decoding the encoded audio signal 
to obtain a multi*chaimel output signal, 

- an output for supplying or reproducing the multi-channel output signal . 

The apparatus may be any electronic equipment or part of such equipment as 
described above. 

The input may comprise any suitable circuitry or device for receiving a coded 
audio signal. Examples of such inputs include a network interface for receiving the signal via 
a computer network, such as a LAN, an Internet, or the like, communications circxutry for 
receiving the signal via a commimications channel, e.g. a wireless communications chaimel, 
etc. In other embodiments, the input may comprise a device for reading a signal from a 
storage medixmi. 

Similarly, the output may comprise any suitable circuitry or device for 
suppljdng a miilti-channel signal in digital or analogue fonn. 



These and other aspects of the invention will be apparent and elucidated from 
the embodiments described in flie following with refaence to the drawing in which: 
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fig. 1 shows a flow diagram of a method of encoding an audio signal 
according to an embodiment of the invention; 

fig. 2 shows a schematic block diagram of a coding system according to an 
embodiment of the invention; 

fig. 3 illustrates a filter method for use in the synthesizing of the audio signal; 

and 

fig. 4 illustrates a decorrelator for use in the synthesizing of the audio signal. 



Fig. 1 shows a flow diagram of a method of encoding an audio signal 
according to an embodiment of the invention. 

In an initial step SI, the incoming signals L and R are split up in band-pass 
signals (preferably with a bandwidth which increases with firequency), indicated by reference 
nimieral 101, such that their parameters can be analyzed as a function of time. One possible 
method for time/firequency slicing is to use time-windowing followed by a transform 
operation, but also time-continuous methods could be used (e.g., filterbanks). The time and 
firequency resolution of this process is preferably adapted to the signal; for transient signals a 
fine time resolution (in the order of a few milliseconds) and a coarse firequency resolution is 
preferred, v^e for non-transient signals a finer firequency resolution and a coarser time 
resolution (in the order of tens of milliseconds) is preferred. Subsequentiy, in step S2, the 
level difference (ELD) of corresponding subband signals is determined; in step S3 tiie time 
difference (ITD or IPD) of corresponding subband signals is determined; and in step S4 the 
amount of similarity or dissimilarity of the waveforms which cannot be accounted for by 
ELDs or ITDs, is described. The analysis of these parameters is discussed below. 

Step S2: Analysis of ILDs 

The ELD is determined by the level difference of the signals at a certain tune 
instance for a given firequency band. One method to determine the ELD is to measure the root 
mean square (rms) value of the corresponding firequency band of both input chaimels and 
compute the ratio of these rms values (preferably e?q)ressed in dB). 

Step S3: Analvsis of the ITDs 

The ITDs are determined by the time or phase alignment which ^ves the best 
match between the waveforms of botii channels. One method to obtain the ITD is to compiite 
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the cross-correlation function between two corresponding subband signals and searching for 
the maximum. The delay that corresponds to this maximum in the cross-correlation function 
can be used as ITD value. A second method is to compute the analytic signals of the left and 
right subband (i.e., computing phase and envelope values) and use the (average) phase 
difference between the channels as IPD parameter. 

Step S4: Analysis of the correlation 

The correlation is obtained by first finding the ELD and ITD that gives the best 
match between the corresponding subband signals and subsequently measuring the similarity 
of the waveforms after compensation for the ITD and/or ILD. Thus, in this fi:amework, the 
correlation is defined as the similarity or dissimilarity of corresponding subband signals 
which can not be attributed to ILDs and/or TTDs, A sviitable measure for this parameter is the 
maximum value of the cross-correlation function (i,e,, the maximum across a set of delays). 
However, also other measures could be used, such as the relative energy of the difference 
signal after ILD and/or ITD compensation compared to the sum signal of corresponding 
subbands (preferably also compensated for ILDs and/or ITDs). This difference parameter is 
basically a linear transformation of the (maximum) correlation. 

In the subsequent steps S5, S6, and S7, tiie determined parameters are 
quantized. An important issue of transmission of parameters is the accuracy of the parameter 
representation (i.e., the size of quantization errors), which is directly related to the necessary 
transnoission capacity. In this section, several issues with respect to the qua nt iza t ion of the 
spatial parameters will be discussed. The beisic idea is to base the quantization errors on so- 
called Just-noticeable differences (JNDs) of the spatial cues. To be more specific, tiie 
quantization error is determined by the sensitivity of the human auditory system to changes in 
the parameters. Since the sensitivity to changes in the parameters strongly depends on the 
values of the parameters itself, we apply the following methods to determine the discrete 
quantization steps. 

Step S5: Quantization of ILDs 

It is known firom psychoacoustic research that the sensitivity to changes in the 
ILD depends on the ILD itself. If the ILD is e3q)ressed m dB, deviations of approximately 1 
dB firom a reference of 0 dB are detectable, while changes in the order of 3 dB are required if 
tiie refCTence level diffCTcnce amounts 20 dB. Therefore, quantization errors can be larger if 
the signals of the left and right channels have a ledger level difference. For example, fliis can 
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be applied by first measuring the level difference between the channels, followed by a non- 
linear (compressive) transformation of the obtained level difference and subsequently a linear 
quantization process, or by using a lookup table for the available ILD values which have a 
nonlinear distribution. The embodiment below gives an example of such a lookup table. 

5 

Step S6: Quantization of the ITDs 

The sensitivity to changes in the ITDs of hmnan subjects can be characterized 
as having a constant phase threshold. This means that in terms of delay times, the 
quantization steps for the ITD should decrease with frequency. Alternatively, if the ITD is 

10 represented in the form of phase differences, the quantization steps should be independent of 
frequency. One method to implement this is to take a fixed phase difference as quantization 
step and determine the corresponding time delay for each frequency band. This ITD value is 
then used as quantization step. Another method is to transmit phase differences which follow 
a firequency-independent quantization scheme. It is also known that above a certain 

1 5 frequency, the human auditory system is not sensitive to ITDs in the finestructure 

waveforms. This phenomenon can be exploited by only transmitting ITD parameters up to a 
certain fi^quency (typically 2 kHz). 

A third method of bitstream reduction is to incorporate ITD quantization steps 
that depend on the ILD and /or the correlation parameters of the same subband. For large 

20 ILDs, the ITOs can be coded less accurately. Furthermore, if the correlation it very low, it is 
known that the human sensitivity to changes in the ITD is reduced. Hence larger ITD 
quantization errors may be applied if the correlation is small. An extreme example of this 
idea is to not transmit ITDs at all if the correlation is below a certain threshold and/or if the 
ILD is sufficiently large for the same subband (typically around 20 dB), 

25 

Step R7' Qua ntization of the correlation 

The quantization error of the correlation depends on (1) the correlation value 
itself and possibly (2) on the ILD. Correlation values near +1 are coded with a high accuracy 
(i.e., a small quantization step), while correlation values near 0 are coded with a low accuracy 
30 (a large quantization step). An example of a set of non-linearly distributed correlation values 
is given in the embodiment A second possibility is to use quantization steps for the 
correlation that depend on the measured ILD of the same subband: for large ILDs (i.e., one 
chaimel is dominant in terms of energy), the quantization errors in the correlation become 
larger. An extreme example of this principle would be to not transmit correlation values for a 
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certain subband at all if the absolute value of the ILD for that subband is beyond a certain 
threshold. 

In step S8, a monaural signal S is generated from the incoming audio signals, 
e.g. as a sum signal of the incoming signal components, by determining a dominant signal, by 
generating a principal component signal from the incoming signal components, or the like. 
This process preferably uses the extracted spatial parameters to generate the mono signal, i.e., 
by first aligning the subband waveforms using the 11 D or IPD before combination. 

Finally, in step S9, a coded signal 102 is generated from the monaural signal 
and the determined parameters. Alternatively, the sum signal and the spatial parameters may 
be commxmicated as separate signals via the same or different channels. 

It is noted that the above method may be implemented by a corresponding 
arrangement, e.g. implemented as general- or special-purpose programmable 
microprocessors. Digital Signal Processors (DSP), Application Specific Integrated Circuits 
(ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), 
special purpose electronic circuits, etc., or a combination thereof. 

Fig. 2 shows a schematic block diagram of a coding system according to an 
embodiment of the invention. The system comprises an encoder 201 and a corresponding 
decoder 202. The decoder 201 receives a stereo signal with two components L and R and 
generates a coded signal 203 comprising a sum signal S and spatial parameters P which are 
communicated to the decoder 202. The signal 203 may be communicated via any suitable 
commimications chaimel 204. Alternatively or additionally, the signal may be stored on a 
removable storage medimn 214, e.g. a memory card, which may be transferred from the 
encoder to the decodo:. 

The encoder 201 comprises analysis modules 205 and 206 for analyzing 
spatial parameters of the incoming signals L and R, respectively, preferably for each 
time/frequency slot. The encoder further comprises a parameter extmction module 207 that 
generates qxxantized spatial parameters; and a combiner modxile 208 that generates a sum (or 
dominant) signal is consisting of a certain combination of the at leeist two input signals. The 
encoder further comprises an encoding module 209 which generates a resultuig coded signal 
203 comprising the monaural signal and the spatial parameters. In one embodiment, the 
module 209 further performs one or more of the following functions: bit rate allocation, 
framing, lossless coding, etc. 

Synthesis (in the decoder 202) is performed by applying tiie spatial parameters 
to tiie sum signal to generate left and right output signals. Hence, the decoder 202 comprises 
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a decoding modxile 210 which performs the inverse operation of module 209 and extracts the 
sum signal S and the parameters P from the coded signal 203. the decoder further comprises a 
synthesis module 21 1 which recovers the stereo components L and R from the sum (or 
dominant) signal and the spatial parameters. 

In this embodiment, the spatial parameter description is combined with a 
monaural (single channel) audio coder to encode a stereo audio signal. It should be noted that 
altiiough the desoribed embodimrat works on stereo signals, the general idea can be applied 
to n-channel audio signals, with n>l. 

In the analysis modules 205 and 206, the left and right incoming signals L and 
respectively, are split up in various time frames (e.g. each comprising 2048 samples at 
44.1 kHz sampling rate) and windowed with a square-root Hanning window. Subsequentiy, 
FFTs are computed. The negative FFT frequencies are discarded and the resulting FFTs are 
subdivided into groups (subbands) of FFT bins. The number of FFT bins that are combined in 
a subband g depends on the frequency: at higher frequencies more bins are combined than at 
lower frequencies. In one embodiment, FFT bins corresponding to approximately L.8 ERBs 
(Equivalent Rectangular Bandwidth) are grouped, resulting in 20 subbands to represent the 
entire audible frequency range. The resulting number of FFT bins S[g\ of each subsequent 
subband (starting at the lowest frequency) is 

S=[4 4 4 5 6 8 9 12 13 17 21 25 30 38 45 55 68 82 100 
477] 

Thus, the first three subbands contain 4 FFT bins, the fourth subband contains 
5 FFT bms, etc. For each subband, the corresponding ILD, ITD and correlation (r) are 
computed. The ITD and correlation are computed simply by setting all FFT bins which 
belong to other groups to zero, multiplsdng the resulting (band-limited) FFTs from the left 
and right channels, followed by an inverse FFT transform. The resulting cross-correlation 
function is scanned for a peak within an interchannel delay between -64 and +63 samples. 
The internal delay corresponding to the peak is used as ITD value, and the value of the cross- 
correlation function at this peak is used as this subband' s interchaimel correlation. Finally, 
the ILD is simply computed by taking the power ratio of the left and right channels for each 
subband. 

In the combiner modide 208, the left and right subbands are summed after a 
phase correction (temporal alignment). This phase correction follows from the computed ITD 
for that subband and consists of delaying the left-channel subband with FTD^ and the right- 
channel subband with -ITD/2. The delay is performed in the frequency domain by 
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appropriate modification of the phase angles of each FFT bin. Subsequently, the sum signal is 
computed by adding the phase-modified versions of the left and right subband signals. 
Finally, to compensate for uncorrelated or correlated addition, each subband of the sum 
signal is multiplied with sqrt(2/(l+r)), with r the correlation of the corresponding subband. If 
5 necessary, the sum signal can be converted to the time domain by (1) inserting complex 
conjugates at negative frequencies, (2) inverse FFT, (3) windowing, and (4) overlap-add. 

In the parameter extraction module 207, the spatial parameters are quantized. 
ELDs (in dB) are quantized to the closest value out of the following set I: 
I=[.19 -.16 -13 -10-8-6^-2024681013 16 19] 
10 ITD quantization steps are determined by a constant phase difference in each 

subband of 0.1 rad. Thus, for each subband, the time difference that corresponds to 0.1 rad of 
the subband center frequency is used as quantization step. For frequencies above 2 kHz, no 
ITD information is transmitted, 

Interchannel correlation values r are quantized to the closest value of the 
1 5 following ensemble R: 

R=[\ 0.95 0.9 0.82 0.75 0.6 0,3 0] 

This will cost another 3 bits per correlation value. 

If the absolute value of the (quantized) ILD of the current subband amounts 19 
dB, no ITD and correlation values are transmitted for this subband. If the (quantized) 
20 correlation value of a certain subband amounts zero, no ITD value is transmitted for that 
subband. 

In this way, each frame requires a maximum of 233 bits to transmit the spatial 
parameters. With a framelength of 1024 frames, the maximum bitrate for transmission 
amounts 10.25 kbit/s. It should be noted that using entropy coding or differential coding, this 

25 bitrate can be reduced fiirther. 

The decoder comprises a synthesis module 211 where the stereo signal is 
synthesized form the received sum signal and the spatial parameters. Hence, for the purpose 
of this description it is assumed that the synthesis module receives a frequency-domain 
representation of the sum signal as described above. This representation may be obtained by 

30 windowing and FFT operations of the time-domain waveform. First, the sum signal is copied 
to the left and ri^t output signals. Subsequentiy, the correlation between the left and right 
signals is modified with a decorrelator. In a prefwred embodiment, a decorrelator as 
described below is used. Subsequentiy, each subband of the left signal is delayed by -ITD^, 
and the right signal is delayed by ITD/2 given the (quantized) ITD corresponding to that 
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subband. Finally, the left and right subbands are scaled according to the ILD for that 
subband. In one embodiment, the above modification is performed by a filter as described 
below. To convert the output signals to the time domain, the following steps are performed: 
(1) mserting complex conjugates at negative firequencies, (2) inverse FFT, (3) windowing, 
and (4) overlap-add. 

Fig. 3 illustrates a filter method for use in the synthesizing of the audio signal. 
In an initial step 301, the incoming audio signal x(t) is segmented into a number of firames. 
The segmentation step 301 splits the signal into frames Xn(t) of a suitable length, for example 
in the range 500-5000 samples, e.g. 1024 or 2048 samples. 

Preferably, the segmentation is performed using overlapping analysis and 
synthesis window functions, thereby suppressing artefacts which may be introduced at the 
frame boundaries (see e.g. Princen, J. P., and Bradley, A. B.: "Analysis/synthesis filterbank 
design based on time domain aliasing cancellation", IEEE transactions on Acoustics, Speech 
and Signal processing. Vol. ASSP 34, 1986). 

Ill step 302, each of the fimies Xn(t) is transformed into the frequency domain 
by applying a Fourier transformation, preferably implemented as a Fast Fourier Transform 
(FFT). The resulting frequency representation of Ihe n-th frame Xn(t) comprises a number of 
frequency components X(k,n) where the parameter n indicates the frame number and the 
parameter k indicates the fi:equency component or frequency bin corresponding to a 
firequency cok, 0<k<K. In general, the frequency domain components X(k,n) are complex 
numbers. 

In step 303, the desired filter for the current frame is determined according to 
the received time-varying spatial parameters. The desired filter is expressed as a desired filter 
response comprising a set of K complex weight factors F(k,n), 0<k<K, for the n-th fitime. 
The filter response F(k,n) may be represented by two real numbers, i.e. its amplitude aQ^^a) 
and its phase 9(k,n) according to F(k,n) = aOc,n) • exp[j (p(k,n)]. 

In the frequency domain, tiie filtered frequency components are Y(k,n) = 
F(k,n) • XOc,n), i.e. tiiey result from a multiplication of the frequency components X(k,n) of 
the mput signal with the filter response FOc»n). As will be apparent to a skilled person, tiiis 
multiplication in the frequency domain corresponds to a convolution of the input signal firame 
Xn(t) with a corresponduxg filter 5i(t). 

In step 304, the desired filter response F(k,n) is modified before applying it to 
the cxjrrent firame X(k,n). In particular, the actual filter response F'(k,n) to be applied is 
determined as a fimction of the desired filter response F(k;a) and of information 308 about 
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previous frames. Preferably, this information comprises the actual and/or desired filter 
response of one or more previous frames, according to 

F'(k,n) = a'(k,n) • expQ 

= 0[F(k,n), F(k,n.l), F(k,n-2)„..,F(M-1), F'(k,n-2),...]. 

Hence, by making the actual filter response dependant of the history of 
previous filter responses, artifacts introduced by changes m the filter response between 
consecutive firames may be eflBciently suppressed. Preferably, the actual form of the 
transform ftinction O is selected to reduce overlap-add artifacts resulting from dynamically- 
varying filter responses. 

For example, the transform frmction <I> may be a fimction of a single previous 
response fimction, e.g. F'(k,n) = Oi[F(k,n), F(k,n-1)] or F'(k,n) = OajTOcA), F'(k,n-1)]. In 
another embodiment, the transform fimction may comprise a floating average over a number 
of previous response fimctions, e.g. a filtered version of previous response functions, or the 
like. Preferred embodiments of the transform fimction O will be described in greater detail 
below. 

In step 305, the actual filter response F'(k,n) is applied to the current firame by 
multiplying the frequency components X(k,n) of the current frame of the input signal with 
the corresponding filter response factors F'(k,n) according to Y(k,n) = F'(k,n) • X(k,n). 

In step 306, the resulting processed frequency components Y(k,n) are 
transformed back into the time domain resulting in filtered frames yn(t). Preferably, the 
inverse transform is implemented as an Inverse Fast Fourier Transform (IFFT). 

Finally, in step 307, the filtered firames are recombined to a filtered signal y(t) 
by an overlap-add method. An ef&cient implementation of such an overlap add method is 
disclosed in Bergmans, J. W. M.: "Digital baseband transmission and recording", Kluwer, 
1996. 

In one embodiment, the transform fimction O of step 304 is intiplemented as a 
phase-change limiter between the current and the previous frame. According to this 
embodiment, the phase change 8(k) of each frequency component F(k,n) compared to the 
actual phase modification (p'(kji-l) appUed to the previous sample of the corresponding 
frequency component is computed, i.e. 5(k) = q)(k,n) - 9'(kji-l). 
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Subsequently, the phase component of the desired filter F(k,n) is modified in 
such a way that the phase change across frames is reduced, if the change would result in 
overlap-add artifacts. According to this embodiment, this is achieved by ensuring that the 
actual phase difference does not exceed a predetemMned threshold c, e.g. by simply cutting of 
the phase difference, according to 

{F(k,n\ if\Sik)\<c 
\f' {k, n - l).e^•"•''«"^^^*^^o^/lenvfce 

The threshold value c may be a predetermined constant, e.g. between tt/S and 
7c/3 rad. In one embodiment, the threshold c may not be a constant but e.g. a function of time, 
frequency, and/or the like. Furthermore, alternatively to the above hard limit for the phase 
change, other phase-change-limiting functions may be used. 

In general, in the above embodiment, the desired phase-change across 
subsequent time frames for individual frequency components is transformed by an input- 
ou^ut function P(5(k)) and the actual filter response F'(k,n) is given by 

F'(k,n) = F*(k,n-1) expO P(5(k))]. (2) 

Hence, according to this embodiment, a transform function P of the phase 
change across subsequent time frames is introduced. 

In another embodiment of the transformation of the filter response, the phase 
l imiting procedure is driven by a suitable measure of tonality, e.g. a prediction method as 
described below. This has tiie advantage that phase jumps between consecutive frames v^ch 
occur in noise-like signals may be excluded from the phase-change Hmiting procedure 
according to the invention. This is an advantage, since limiting such phase jumps in noise 
like signals would make the noise-like signal sound more tonal which is often perceived as 
synthetic or metallic. 

According to this embodiment, a predicted phase error 9(k) = 9(k,n) - (p(k,n-l) 
- CDk • h is calculated. Here, cok denotes the frequency corresponding to the k-th frequency 
component and h denotes the hop size in samples. Here, the term hop size refers to the 
difference between two adjacent window centers, i.e. half the analysis length for symmetric 
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windows. In the following, it is assumed that the above error is wrapped to the interval [- 

Subsequently, a prediction measure Pk for the amount of phase predictability 
in the k-th frequency bin is calculated according to Pk = (^t - |9(k)|) /we [0,1], where H 
denotes the absolute value. 

Hence, the above measure Pk yields a value between 0 and 1 corresponding to 
the amount of phase-predictability in the k-th frequency bin. If Pk is close to 1, the underlying 
signal may be assumed to have a high degree of tonality, i.e. has a substantially sinusoidal 
waveform. For such a signal, phase jximps are easily perceivable, e.g. by the listener of an 
audio signal. Hence, phase jxmips should preferably be removed in this case. On the other 
hand, if the value of Pk is close to 0, the vmderlying signal may be assumed to be noisy. For 
noisy signals phase jumps are not easily perceived and may, therefore, be allowed. 

Accordingly, the phase limiting function is applied if Pk exceeds a 
predetemiined threshold, i.e. Pk > A, resulting in the actual filter response F'(k,n) according 
to 



^ If- (k. n - 1) . ei P[^ Wl otherwise 



Here, A is limited by the upper and lower boundaries of P which are +1 and 0, 
respectively. The exact value of A depends on the actual implementation. For example, A 
may be selected between 0.6 and 0.9. 

It is understood that, alternatively, any other suitable measure for estimating 
the tonality may be xised. In yet another embodiment, the allowed phase jump c described 
above may be made dependant on a suitable measure of tonality, e.g. the measure Pk above, 
thereby allowing for larger phase jumps if Pk is large and vice versa. 

Fig. 4 illustrates a decorrelator for use in the synthesizing of the audio signal. 
The decorrelator comprises an all-pass filter 401 receiving the monoaural signal x and a set 
of spatial parameters P including the interchannel cross-correlation r and a parameter 
indicative of the channel difference c. It is noted that the parameter c is related to the 
interchannel level difference by ELD = k-log(c), where k is a constant, i.e. ILD is proportional 
to the logaritimi of c. 

Preferably, the all-pass filter comprises a frequency-dependant delay providing 
a relatively smaller delay at high frequ^cies than at low frequencies. This may be achieved 
by replacing a fixed-delay of the all-pass filter with an all-pass filter comprising one period 
of a Schroeder-phase complex (see e.g. M.R. Schroeder, "Synthesis of low-peak-factor 
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signals and binary sequences with low autocorrelation", IEEE Transact. Inf. Theor., 16:85- 
89, 1970). The decorrelator further comprises an analysis circuit 402 that receives the spatial 
parameters from the decoder and extracts the interchannel cross-correlation r and the channel 
diJSerence c. The circuit 402 determines a mixing matrix M(a,P) as will be described below. 
The components of the mixing matrix are fed into a transformation circuit 403 which further 
receives the input signal x and the filtered signal H®x. The circuit 403 performs a mixing 
operation according to 



(?) 



resulting in the output signals L and R. 

The correlation between the signals L and R may be expressed as an angle a 
between vectors representing the L and R signal, respectively, in a space spanned by the 
signals x and H®x, according to r=cos(a). Consequentiy, any pair of vectors that exhibits the 
1 5 correct angular distance has the ^ecified correlation. 

Hence, a mixing matrix M which transforms the signals x and H<S^ into 
signals L and R with a predetermined correlation r may be expressed as. follows: 

l^_rcos(a/2) sin(a/2)^ (4) 
20 " l^cos(-a / 2) sin(-a / 2) / 

Thus, the amount of all-pass filtered signal depends on the desired correlation. 
Fiarthermore, the energy of the all-pass signal component is the same in both output channels 
(but with a 180° phase shift). 
25 It is noted that the case where the matrix M is given by 



G -■) 



(5) 



i.e. the case where a=90° corresponding to uncorrelated output signals(r=0), corresponds to a 
30 Lauridsen decorrelator. 

In order to illustrate a problem with the matrix of eqn. (5), we assimie a 
situation with an extreme amplitude panning towards the left channel, i.e. a case where a 
certain signal is present in the left channel only. We further assume that the desired 
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correlation between the outpxits is zero. In this case, the output of the left channel of the 
transformation of eqn. (3) with the mixing matrix of eqn. (5) yields L = 1 / a/2(x + H ® x) . 
Thus, the output consists of the original signal x combined with its all-passed filtered version 
H®x. 

However, this is an undesired situation, since the all-pass filter usually 
deteriorates the perceptual quality of the signal. Furthermore, the addition of the original 
signal and the filtered signal results in comb-filter effects, such as perceived coloration of the 
ouQ)ut signal. In this assumed extreme case, the best solution woxdd be that the left output 
signal consists of the input signal. This way the correlation of the two output signals would 
still be zero. 

In situations with more moderate level differences, the preferred situation is 
that the louder ou^ut channel contains relatively more of the original signal, and the softer 
output chaimel contains relatively more of the filtered signal. Hence, in general, it is 
preferred to maximize the amount of the original signal present in the two outputs together, 
and to minimize the amount of tiie filtered signal. 

According to this embodiment, this is achieved by introducing a different 
miving matrix including an additional common rotation: 



M = C 



rcos(P + a/2) sin(p 
\cos(p-a/2) sin(p 



-a/2)) 



(6) 



Here p is an additional rotation, and C is a scaling matrix which ensures that 
the relative level diGference between the output signals equals c, i.e. 



c = 



1 + 0 

0 



0 

1 



1 + cy 



Inserting the matrix of eqn. (6) in eqn. (3) 3delds the output signals generated 
by the matrixing operation according to this embodiment: 



30 



1+c 
0 



0 

1 



1 + cy 



('cosO + a/2) sin(p + a/2)"^ ( x 
\cos(p-a/2) sin(p-a/2)J i^H®xJ 
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Hence, the output signals L and R still have an angular difference a, i.e. the 
correlation between the L and R signals is not affected by the scaling of the signals L and R 
according to the desired level difference and the additional rotation by the angle p of both the 
L and the R signal. 

As mentioned above, preferably, the amount of the original signal x in the 
summed output of L and R shoiild be maxintiized. This condition may be used to determine 
the angle according to 

Sx 

which yields the condition: 

tan(p) = ^^tan(a/2). 
1 + c 

In summary, this application describes a psycho-acoustically motivated, 
parametric description of the spatial attributes of multichannel audio signals. This parametric 
description allows strong bitrate reductions in audio coders, since only one monaural signal 
has to be transmitted, combined with (quantized) parameters which describe the spatial 
properties of the signal. The decoder can form the original amount of audio channels by 
applying the spatial parameters. For near-CD-quality stereo audio, a bitmte associated with 
these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial 
impression at the receiviag end. This bitrate can be scaled down further by reducing the 
spectral and/or temporal resolution of the spatial parameters and/or processing the spatial 
parameters using losless compression algorithms. 

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing &om the scope of the appended claims. 

For example, the invention has primarily been described in connection with an 
embodiment using the two localization cues ILD and ITD/IPD. In alternative embodiments, 
other localization cues may be used. Furthermore, in one embodiment, the ELD, the ITD/IPD, 
and the interchannel cross-correladon may be determined as described above, but only the 
interchannel cross-correlation is transmitted together with the monaural signal, thereby 
furttier reducing the required bandwidth/storage capacity for transmitting/storing the audio 
signal. Alternatively, the interchamel cross-correlation and one of the ILD and ITD/TPD 
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may be transmitted. In these embodiments, the signal is synthesized jBrom the monaural signal 
on the basis of the transmitted parameters only. 

In the claims, any reference signs placed between parentheses shall not be 
construed as limiting the claim. The word "comprising" does not exclude the presence of 
elraients or steps other than those listed in a claim. The word "a" or "an" preceding an 
element does not exclude the presence of a plurality of such elements. 

The invention can be implemented by means of hardware comprising several 
distinct elements, and by means of a suitably progranmied computer. In the device claim 
enumerating several means, several of these means can be embodied by one and the same 
item of hardware. The mere fact that certain measures are recited in mutually different 
dependent claims does not indicate that a combination of these measures cannot be used to 
advantage. 
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CLAIMS: 



1 . A method of coding an audio signal, the method comprising: 

- generating a monaural signal comprising a combination of at least two input audio 
channels, 

determining a set of spatial parameters ladicative of spatial properties of the at least two 
5 input audio channels, the set of spatial parameters including a parameter representing a 

measure of similarity of waveforms of the at least two input audio channels, and 

- generating an encoded signal comprising the monaural signal and the set of spatial 
parameters. 

10 2. A method according to claim 1, wherein the step of determining a set of spatial 

parameters indicative of Sfpatial properties comprises detemiining a set of spatial parameters 
as a function of time and frequency. 

3 . A method according to claim 2, wherein the step of detemiining a set of spatial 
parameters indicative of spatial properties comprises 

- dividing each of the at least two input audio channels into corresponding pluralities of 
frequency bands; 

- for each of the plurality of frequency bands determining the set of spatial parameters 
indicative of spatial properties of the at least two input audio channels within the 
corresponding frequency band. 

4. A method according to any one of claims 1 through 3, wherein the set of 
spatial parameters includes at least one localization cue. 

25 5. A method according to claim 4, wherein the set of spatial parameters includes 

at least two localization cues comprising an interchannel level difference and a selected one 
of an interchannel time difference and an interchannel phase difference. 
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6. A method according to claim 4 or 5, wherein the measure of similarity 
comprises information that cannot be accounted for by the localization cues. 

7. A method according to any one of claims 1 through 6, wherein the measure of 
similarity corresponds to a value of a cross-correlation function at a maximum of said cross- 
correlation function. 

8. A method according to any one of claims 1 through 7, wherein the step of 
generating an encoded signal comprising the monaural signal and the set of spatial 
parameters comprises generating a set of quantized spatial parameters, each introducing a 
corresponding quantization error relative to the corresponding determined spatial parameter, 
wherein at least one of the introduced qxiantization errors is controlled to depend on a value 
of at least one of the determined spatial parameters. 

9. An encoder for coding an audio signal, the encoder comprising: 

- means for generating a monaural signal comprising a combination of at least two input 
audio channels, 

- means for determining a set of spatial parameters indicative of spatial properties of the at 
least two input audio channels, the set of spatial parameters including a parameter 
representing a measure of similarity of waveforms of the at least two input audio 
channels, and 

- means for generating an encoded signal comprising the monaural signal and the set of 
spatial parameters. 

10. An apparatus for supplying an audio signal, the apparatus comprising: 
an input for receiving an audio signal, 

an encoder as claimed in claim 9 for encoding the audio signal to obtain an 
encoded audio signal, and 

an output for supplying the encoded audio signal. 



11. 
and 



An encoded audio signal, the signal comprising: 

a monaural signal comprising a combination of at least two audio charmels. 
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a set of spatial parameters indicative of spatial properties of the at least two 
input audio chatmels, the set of spatial parameters including a parameter representing a 
measure of similarity of waveforms of the at least two input audio channels. 

12. A storage medium having stored thereon an encoded signal as claimed in 
claim 11. 

13. A method of decoding an encoded audio signal, the method comprising: 
obtaining a monaural signal from the encoded audio signal, the monaural 

signal comprising a combination of at least two audio channels, 

obtaining a set of spatial parameters from the encoded audio signal, the set of 
spatial parameters including a parameter representing a measure of similarity of waveforms 
of the at least two audio chaimels, and 

generating a multi-channel output signal from the monaural signal and the 
spatial parameters. 

14, A decoder for decoding an encoded audio signal, the decoder comprising 
means for obtaining a monaural signal from the encoded audio signal, the 

monaural signal comprising a combination of at least two audio channels, and 

means for obtaining a set of spatial parameters from the encoded audio signal, 

the set of spatial parameters including a parameter representing a measure of similarity of 

waveforms of the at least two audio channels, and 

means for generating a multi-channel output signal from the monaural signal 

and the spatial parameters. 

15, An apparatus for supplying a decoded audio signal, the apparatus comprising: 
an input for receiving an encoded audio signal, 

a decoder as claimed in claim 14 for decoding the encoded audio signal to 
obtain a multi-channel output signal, 

an ou^ut for supplying or reproducing the multi-chaimel output signal. 
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